
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification



Abstract

Class imbalance classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, in real-life applications the minority class instances represent the concept of greater interest than the majority class instances. Recently, several techniques based on sampling methods (under-sampling the majority class and over-sampling the minority class), cost-sensitive learning methods, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach with the boosting (AdaBoost) algorithm, called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to the RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We compared the performance of the CUSBoost algorithm with state-of-the-art ensemble learning methods such as AdaBoost, RUSBoost, and SMOTEBoost on 13 imbalanced binary and multi-class datasets with various imbalance ratios. The experimental results show that CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets.
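The abstract does not spell out the algorithm's steps, so the following is only a rough sketch of the general idea behind cluster-based under-sampling, not the authors' implementation. It assumes the majority class is grouped with plain k-means and that a fixed fraction of each cluster is kept, so every region of the majority class stays represented. The function names (`kmeans`, `cluster_undersample`) and the parameters `k` and `keep_fraction` are hypothetical, and the AdaBoost training step that would consume the reduced sample is left out.

```python
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels


def cluster_undersample(X, y, majority_label, k=3, keep_fraction=0.5, seed=0):
    """Keep every non-majority row plus a fraction of each majority cluster."""
    rng = random.Random(seed)
    maj_idx = [i for i, lab in enumerate(y) if lab == majority_label]
    kept = [i for i, lab in enumerate(y) if lab != majority_label]
    clusters = kmeans([X[i] for i in maj_idx], k, seed=seed)
    for c in range(k):
        members = [i for i, lab in zip(maj_idx, clusters) if lab == c]
        if not members:
            continue
        # Assumption: sample the same fraction from every cluster.
        n_keep = max(1, round(keep_fraction * len(members)))
        kept.extend(rng.sample(members, n_keep))
    kept.sort()
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data (made up for illustration): 90 majority points in three blobs,
# 10 minority points in a fourth blob.
rng = random.Random(42)
X = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
     for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
y = [0] * 90
X += [(rng.gauss(2.5, 0.5), rng.gauss(2.5, 0.5)) for _ in range(10)]
y += [1] * 10

X_res, y_res = cluster_undersample(X, y, majority_label=0, k=3, keep_fraction=0.5)
print("before:", Counter(y), "after:", Counter(y_res))
```

In a boosted variant, a reduced sample like `X_res, y_res` would presumably be drawn again in each boosting round before fitting the weak learner, in the way RUSBoost does with random under-sampling; that detail is an assumption here rather than something the abstract states.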
